Construction of Semantic Collocation Bank Based on Semantic Dependency Parsing
Abstract
Collocation has always been an important issue in language research, especially in research on Chinese. Chinese is an isolating language, which lacks morphological inflection. Establishing a relatively complete dictionary of Chinese collocations would be a great contribution to the study of Chinese. Collocation plays a significant supporting role in many fields of NLP, such as information retrieval, machine translation, and information extraction. Ding and Bai proposed a method of query expansion based on local co-occurrence [1]; Lin incorporated collocation relationships into a language model for query expansion, which overcame the shortage of relationships caused by the lack of context in traditional queries [2]. In basic NLP research areas such as syntax and semantics, collocation also plays an important role. By comparing the adjective collocation patterns of Chinese learners of English with those of native speakers, Zhang analyzed the typical characteristics of different learners when using adjective collocations [3]; Xing emphasized the importance of collocation in second language learning [4].
Early research on automatic collocation extraction was carried out by Choueka, Klein and Neuwitz, who defined collocations as sequences of adjacent words and used co-occurrence frequency to extract them [6]; Church and Hanks improved the automatic extraction technique and proposed mutual information as a measure for evaluating collocations [7]. By proposing a formula for calculating the strength of a collocation, introducing a dispersion formula, and integrating automatic part-of-speech tagging, Smadja's Xtract system raised the accuracy of collocation extraction to 80% [8]; Lin extracted collocations based on shallow syntactic parsing [9]; Shouxun YANG applied the decision tree method to collocation extraction by integrating frequency, likelihood ratio, pointwise mutual information, variance, and other statistical indicators [10]. In China, a number of outstanding dictionaries have been published,
PACLIC 29
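To make the two earliest scoring ideas above concrete, the following Python sketch ranks candidate collocations both by raw co-occurrence frequency of adjacent words (the Choueka et al. idea) and by pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), the quantity underlying the Church and Hanks measure. It is a minimal illustration under stated assumptions, not the method of any cited paper. The function names `bigram_stats`, `pmi`, and `rank_collocations`, the `min_freq` threshold, and the toy corpus are all hypothetical. For simplicity it counts only adjacent word pairs rather than co-occurrences within a wider window.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Count unigrams and adjacent-word bigrams (collocations as adjacent words)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))) for an adjacent pair."""
    n_tokens = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    x, y = pair
    p_xy = bigrams[pair] / n_bigrams
    p_x = unigrams[x] / n_tokens
    p_y = unigrams[y] / n_tokens
    return math.log2(p_xy / (p_x * p_y))


def rank_collocations(tokens, min_freq=2):
    """Return (pair, frequency, PMI) for pairs seen >= min_freq times, highest PMI first.

    min_freq is a hypothetical threshold that filters out very rare pairs.
    """
    unigrams, bigrams = bigram_stats(tokens)
    candidates = [pair for pair, count in bigrams.items() if count >= min_freq]
    scored = [(pair, bigrams[pair], pmi(pair, unigrams, bigrams)) for pair in candidates]
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus, assumed to be already word-segmented.
    corpus = ("strong tea is good . strong tea is hot . "
              "powerful computer is fast . the tea is hot .").split()
    for pair, freq, score in rank_collocations(corpus):
        print(pair, freq, round(score, 2))
```

On this toy corpus the most frequent pair is ("tea", "is"), but PMI ranks ("strong", "tea") first, because PMI rewards pairs whose words seldom occur apart rather than pairs of individually frequent words. PMI is also known to overrate very rare pairs, which is one reason to combine it with a frequency threshold such as `min_freq`, or with other indicators as in the decision tree approach mentioned above.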